109 research outputs found

    Heterofusion: Fusing genomics data of different measurement scales

    In systems biology, it is becoming increasingly common to measure biochemical entities at different levels of the same biological system. Hence, data fusion problems are abundant in the life sciences. With the availability of a multitude of measuring techniques, one of the central problems is the heterogeneity of the data. In this paper, we discuss a specific form of heterogeneity, namely that of measurements obtained at different measurement scales, such as binary, ordinal, interval, and ratio-scaled variables. Three generic fusion approaches are presented, two of which are new to the systems biology community. The methods are presented, put in context, and illustrated with a real-life genomics example.

    Breast adipocyte size associates with ipsilateral invasive breast cancer risk after ductal carcinoma in situ

    Although ductal carcinoma in situ (DCIS) is a non-obligate precursor to ipsilateral invasive breast cancer (iIBC), most DCIS lesions remain indolent. Hence, overdiagnosis and overtreatment of DCIS are a major concern. There is an urgent need for prognostic markers that can distinguish harmless from potentially hazardous DCIS. We hypothesized that features of the breast adipose tissue may be associated with risk of subsequent iIBC. We performed a case-control study nested in a population-based DCIS cohort, consisting of 2,658 women diagnosed with primary DCIS between 1989 and 2005, uniformly treated with breast-conserving surgery (BCS) alone. We assessed breast adipose features with digital pathology (HALO®, Indica Labs) and related these to iIBC risk in 108 women who developed subsequent iIBC (cases) and 168 women who did not (controls) by conditional logistic regression, accounting for clinicopathological and immunohistochemistry variables. Large breast adipocyte size was significantly associated with iIBC risk (odds ratio (OR) 2.75, 95% confidence interval (95% CI) = 1.25 to 6.05). High cyclooxygenase (COX)-2 protein expression in the DCIS cells was also associated with subsequent iIBC (OR 3.70, 95% CI = 1.59 to 8.64). DCIS with both high COX-2 expression and large breast adipocytes was associated with a 12-fold higher risk (OR 12.0, 95% CI = 3.10 to 46.3, P …

    Computational pan-genomics: Status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations

    Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference

    An important feature of Bayesian statistics is the opportunity to do sequential inference: the posterior distribution obtained after seeing a dataset can be used as the prior for a second inference. However, when Monte Carlo sampling methods are used for inference, we only have a set of samples from the posterior distribution. To do sequential inference, we then either have to evaluate the second posterior at only these locations and reweight the samples accordingly, or we can estimate a functional description of the posterior probability distribution from the samples and use that as the prior for the second inference. Here, we investigated to what extent we can obtain an accurate joint posterior from two datasets if the inference is done sequentially rather than jointly, under the condition that each inference step is done using Monte Carlo sampling. To test this, we evaluated the accuracy of kernel density estimates, Gaussian mixtures, mixtures of factor analyzers, vine copulas and Gaussian processes in approximating posterior distributions, and then tested whether these approximations can be used in sequential inference. In low dimensionality, Gaussian processes are more accurate, whereas in higher dimensionality Gaussian mixtures, mixtures of factor analyzers or vine copulas perform better. In our test cases of sequential inference, using posterior approximations gives more accurate results than direct sample reweighting, but joint inference is still preferable over sequential inference whenever possible. Since the performance is case-specific, we provide an R package, mvdens, with a unified interface for the density approximation methods.
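
    The "reweight the samples" route described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the mvdens package; the toy model (a Gaussian mean with known noise, a wide Gaussian prior), the sample sizes, and all variable names are assumptions chosen so that the exact joint posterior is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed for illustration): unknown mean mu, known noise sd = 1,
# prior mu ~ N(0, 10^2).
sigma, prior_mean, prior_sd = 1.0, 0.0, 10.0
data1 = rng.normal(2.0, sigma, size=20)
data2 = rng.normal(2.0, sigma, size=20)


def conjugate_posterior(data, m0, s0):
    """Exact Gaussian posterior (mean, sd) of mu given data and prior N(m0, s0^2)."""
    precision = 1.0 / s0**2 + len(data) / sigma**2
    mean = (m0 / s0**2 + data.sum() / sigma**2) / precision
    return mean, precision**-0.5


# Step 1: samples from the first posterior (a stand-in for Monte Carlo output).
m1, s1 = conjugate_posterior(data1, prior_mean, prior_sd)
samples = rng.normal(m1, s1, size=50_000)

# Step 2: reweight each sample by the likelihood of the second dataset.
log_w = -0.5 * ((data2[None, :] - samples[:, None]) ** 2).sum(axis=1) / sigma**2
weights = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
weights /= weights.sum()

sequential_mean = float(np.sum(weights * samples))
ess = float(1.0 / np.sum(weights**2))  # effective sample size after reweighting

# Reference: joint inference on both datasets at once.
joint_mean, joint_sd = conjugate_posterior(
    np.concatenate([data1, data2]), prior_mean, prior_sd
)

print(f"sequential (reweighted) mean: {sequential_mean:.3f}")
print(f"joint posterior mean:         {joint_mean:.3f}")
print(f"effective sample size:        {ess:.0f} of {len(samples)}")
```

    In this one-dimensional case the reweighted estimate tracks the joint posterior closely. The effective sample size shows how many samples still carry weight after the second dataset is applied; it shrinks when the two posteriors overlap poorly, which is one reason a fitted density approximation can be preferable to direct reweighting.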

    Mouse models in the era of large human tumour sequencing studies

    Cancer is a complex disease in which cells progressively accumulate mutations disrupting their cellular processes. A fraction of these mutations drive tumourigenesis by affecting oncogenes or tumour suppressor genes, but many mutations are passengers with no clear contribution to tumour development. The advancement of DNA and RNA sequencing technologies has enabled in-depth analysis of thousands of human tumours from various tissues to perform systematic characterization of their (epi)genomes and transcriptomes in order to identify (epi)genetic changes associated with cancer. Combined with considerable progress in algorithmic development, this expansion in scale has resulted in the identification of many cancer-associated mutations, genes and pathways that are considered to be potential drivers of tumour development. However, it remains challenging to systematically identify drivers affected by complex genomic rearrangements and drivers residing in non-coding regions of the genome or in complex amplicons or deletions of copy-number-driven tumours. Furthermore, functional characterization is challenging in the human context due to the lack of genetically tractable experimental model systems in which the effects of mutations can be studied in the context of their tumour microenvironment. In this respect, mouse models of human cancer provide unique opportunities for pinpointing novel driver genes and characterizing them in detail. In this review, we provide an overview of approaches for complementing human studies with data from mouse models. We also discuss state-of-the-art technological developments for cancer gene discovery and validation in mice.